Serializing C intermediate representations for efficient and portable parsing

Authors

  • Jeffrey A. Meister
  • Jeffrey S. Foster
  • Michael Hicks
Abstract

C static analysis tools often use intermediate representations (IRs) that organize program data in a simple, well-structured manner. However, the C parsers that create IRs are slow, and because they are difficult to write, only a few implementations exist, limiting the languages in which a C static analysis can be written. To solve these problems, we investigate two language-independent, on-disk representations of C IRs: one using XML, and the other using an Internet standard binary encoding called XDR. We benchmark the parsing speeds of both options, finding the XML to be about a factor of two slower than parsing C and the XDR over six times faster. Furthermore, we show that the XML files are far too large at 19 times the size of C source code, while XDR is only 2.2 times the C size. We also demonstrate the portability of our XDR system by presenting a C source code querying tool in Ruby. Our solution and the insights we gained from building it will be useful to analysis authors and other clients of C IRs. We have made our software freely available for download at http://www.cs.umd.edu/projects/PL/scil/.
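
To make the binary option concrete, the following is a minimal sketch of how a small expression tree could be written and read back using XDR's basic conventions (big-endian 4-byte integers, length-prefixed strings padded to a multiple of four bytes, and an integer discriminant in front of each union arm). The node shapes, the tag values `TAG_CONST`, `TAG_VAR`, `TAG_BINOP`, and all function names are hypothetical and chosen only for illustration; they are not the IR schema or the API of the software described in the abstract.

```python
import struct

# Hypothetical discriminants for a toy expression IR (not the paper's schema).
TAG_CONST, TAG_VAR, TAG_BINOP = 0, 1, 2


def xdr_uint(n: int) -> bytes:
    """Encode an unsigned 32-bit integer in big-endian order, as XDR does."""
    return struct.pack(">I", n)


def xdr_string(s: str) -> bytes:
    """Encode a string as length + bytes + zero padding to a 4-byte boundary."""
    data = s.encode("ascii")
    return xdr_uint(len(data)) + data + b"\x00" * (-len(data) % 4)


def encode_expr(node) -> bytes:
    """Encode a tuple-based expression node as a discriminated union."""
    kind = node[0]
    if kind == "const":
        return xdr_uint(TAG_CONST) + struct.pack(">i", node[1])
    if kind == "var":
        return xdr_uint(TAG_VAR) + xdr_string(node[1])
    if kind == "binop":
        _, op, lhs, rhs = node
        return (xdr_uint(TAG_BINOP) + xdr_string(op)
                + encode_expr(lhs) + encode_expr(rhs))
    raise ValueError(f"unknown node kind: {kind!r}")


def read_uint(buf: bytes, pos: int):
    (n,) = struct.unpack_from(">I", buf, pos)
    return n, pos + 4


def read_string(buf: bytes, pos: int):
    n, pos = read_uint(buf, pos)
    s = buf[pos:pos + n].decode("ascii")
    return s, pos + n + (-n % 4)  # skip the padding bytes as well


def decode_expr(buf: bytes, pos: int = 0):
    """Decode one expression; return (node, position after the node)."""
    tag, pos = read_uint(buf, pos)
    if tag == TAG_CONST:
        (value,) = struct.unpack_from(">i", buf, pos)
        return ("const", value), pos + 4
    if tag == TAG_VAR:
        name, pos = read_string(buf, pos)
        return ("var", name), pos
    if tag == TAG_BINOP:
        op, pos = read_string(buf, pos)
        lhs, pos = decode_expr(buf, pos)
        rhs, pos = decode_expr(buf, pos)
        return ("binop", op, lhs, rhs), pos
    raise ValueError(f"unknown tag: {tag}")


if __name__ == "__main__":
    expr = ("binop", "+", ("var", "x"), ("const", 1))
    data = encode_expr(expr)
    node, end = decode_expr(data)
    print(len(data), node == expr, end == len(data))
```

Because every field has a fixed width or an explicit length prefix, a reader in any language can walk the byte stream with simple offset arithmetic and no tokenizing, which is one plausible reason a binary encoding of this kind can be both compact and quick to load.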

Similar resources

Serializing C intermediate representations for efficient and portable parsing (preprint)

C static analysis tools often use intermediate representations (IRs) that organize program data in a simple, well-structured manner. However, the C parsers that create IRs are slow, and because they are difficult to write, only a few implementations exist, limiting the languages in which a C static analysis can be written. To solve these problems, we investigate two language-independent, on-dis...

Serializing C intermediate representations for efficient and portable parsing (preprint)

C static analysis tools often use intermediate representations (IRs) that organize program data in a simple, well-structured manner. However, the C parsers that create IRs are slow, and because they are difficult to write, only a few implementations exist, limiting the languages in which a C static analysis can be written. To solve these problems, we investigate two language-independent, on-disk ...

Serializing C Intermediate Representations to Promote Efficiency and Portability

C static analysis tools need access to intermediate representations (IRs) that organize program data in a well-structured manner. However, the C parsers that create IRs are slow, and they are not available for most languages. To solve these problems, we investigate two language-independent, on-disk representations of C IRs: one using XML, and the other using an Internet standard binary encoding...

Learning Representations for Text-level Discourse Parsing

In the proposed doctoral work we will design an end-to-end approach for the challenging NLP task of text-level discourse parsing. Instead of depending on mostly hand-engineered sparse features and independent components for each subtask, we propose a unified approach completely based on deep learning architectures. To train more expressive representations that capture communicative functions an...

Dependency Link Embeddings: Continuous Representations of Syntactic Substructures

We present a simple method to learn continuous representations of dependency substructures (links), with the motivation of directly working with higher-order, structured embeddings and their hidden relationships, and also to avoid the millions of sparse, template-based word-cluster features in dependency parsing. These link embeddings allow a significantly smaller and simpler set of unary featu...

Journal title:
  • Softw., Pract. Exper.

Volume 40

Publication date: 2010